-
- ARM Assembler
- *************
-
-
- Using the assembler
- ===================
-
- Coding in ARM assembler is very straightforward. If you have used an
- ARM Assembler before, you will already know the instructions.
- Otherwise I recommend
-
- 'Acorn Risc Machine Family Data Manual, Prentice Hall, ISBN 0-13-781618-9'
-
- for further information. It tells you everything you should know
- about the ARM CPU and covers the whole instruction set plus lots of
- hardware details. In fact it was my only source of information at
- hand when starting this RISC OS Forthmacs port.
-
- This documentation is far from complete, but it covers most aspects.
- If you are writing code, have a look at some kernel sources to see
- how it works. Whenever you are not sure about the code produced,
- inspect it with
- code demo ...... c;
- see demo
- and you have the code right in front of you.
-
- Also there is a chapter "Assembler Tutorial".
-
- Like most Forth assemblers, this assembler is really just a
- vocabulary which contains the words for assembling ARM code. It is
- "activated" by adding the assembler vocabulary to the search order.
- There are also some common ways to control assembly which do more than
- just put the assembler vocabulary in the search order. Like Forth in
- general, it uses an 'operands first - operator last' syntax.
-
- Let's now look at some kernel source and compare the Forth assembler
- syntax with the original Acorn syntax (as displayed by the
- disassembler utility).
-
- code count (s adr -- adr1 cnt )
- r0 top mov
- top r0 byte )+ ldr
- r0 sp push c;
- see count
- code count
- ( a148 ) mov r0,r10
- ( a14c ) ldrb r10,[r0],#1
- ( a150 ) str r0,[r13,#-4]!
- ( a154 ) ldr pc,[r8],#4
-
-
- General syntax
- ==============
-
- All instructions follow the general syntax:
-
- ARM: opcode r-dest r-n operand
- Forth: r-dest r-nsrc operand modifiers condition op-code
-
-
- The brackets and commas in the original assembler source are replaced
- by spaces, 'addressing mode indicators' and macros.
-
- The post-indexed form [r0],#4 becomes )+ , indicating a postincrement
- by 1 or 4 according to byte/word access.
-
- push is a macro meaning
- -( str.
-
- c; at the end assembles a next instruction
- ldr pc,[r8],#4
- pc ip )+ ldr
- and quits assembling.
-
- The operands ( registers or numbers ) must appear in the correct order
- followed by modifiers.
-
-
- Conditions
- ==========
-
- All instructions can be conditionally executed on ARM CPUs. All
- condition codes are implemented; they should preferably be written
- just before the opcode itself. You don't have to write the AL
- condition, as it is the default.
-
- Note: According to ARM standards, NV is NOT implemented and should
- never be used because of future instruction set extensions.
-
- Condition codes available :
- EQ NE CS CC MI PL VS VC HI LS GE LT GT LE AL
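To illustrate what a condition does to the assembled word, here is a minimal Python sketch (not part of Forthmacs). It relies on the standard ARM encoding, where the condition occupies bits 31..28 of every instruction. The names CONDITION_CODES and with_condition are made up for this illustration.

```python
# Standard ARM condition-code field values (bits 31..28 of an instruction).
# NV (0xF) is left out on purpose, as the text above says it must not be used.
CONDITION_CODES = {
    "EQ": 0x0, "NE": 0x1, "CS": 0x2, "CC": 0x3,
    "MI": 0x4, "PL": 0x5, "VS": 0x6, "VC": 0x7,
    "HI": 0x8, "LS": 0x9, "GE": 0xA, "LT": 0xB,
    "GT": 0xC, "LE": 0xD, "AL": 0xE,
}


def with_condition(instruction, condition="AL"):
    """Return the 32-bit instruction word with its condition field replaced.

    Hypothetical helper; AL is the default, as in the assembler.
    """
    field = CONDITION_CODES[condition.upper()]
    return (instruction & 0x0FFFFFFF) | (field << 28)


MOV_R0_R10 = 0xE1A0000A  # mov r0,r10 assembled with the default AL condition

if __name__ == "__main__":
    # The same instruction, executed only when the Z flag is set (EQ).
    print(hex(with_condition(MOV_R0_R10, "EQ")))
```

Only the top four bits change; the rest of the instruction word is untouched, which is why any instruction can take any condition.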
-
-
- Shifts
- ======
-
- Numerous shift operators are available:
-
- ASL #ASL LSL #LSL LSR #LSR ASR #ASR ROR #ROR RRX
-
- For a shift operator with a leading #, the shift count is given as a
- number; otherwise it is taken from a register.
-
- This assembler is clever enough to find out shifted immediates itself,
- so you don't have to worry about lines like
- top th f0 # td 24 #lsl mov
- just write
- top th f0000000 # mov
- instead.
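How such a "shifted immediate" can be found is sketched below in Python. This is an illustration of the ARM immediate format (an 8-bit value rotated right by twice a 4-bit count), not the Forthmacs source; the function names are assumptions.

```python
def rotate_left32(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def encode_immediate(value):
    """Return (imm8, rot) with imm8 rotated right by 2*rot == value, or None.

    Hypothetical helper: tries all 16 even rotations and takes the first
    one that leaves an 8-bit value.
    """
    value &= 0xFFFFFFFF
    for rot in range(16):
        # Undo a rotate-right by 2*rot by rotating left by the same amount.
        imm8 = rotate_left32(value, 2 * rot)
        if imm8 < 0x100:
            return imm8, rot
    return None


if __name__ == "__main__":
    print(encode_immediate(0xF0000000))  # fits: one instruction is enough
    print(encode_immediate(0x12345678))  # does not fit a single immediate
```

A constant that yields None here cannot be expressed as one immediate operand and needs more than one instruction.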
-
-
- Register usage
- ==============
-
- Registers R0 - R6 are available for use within code definitions.
- Don't try to use them for permanent storage, because they are used by
- many code words with no attempt to preserve the previous contents.
-
- r7    floating stack pointer    fsp
- r8    instruction pointer       ip
- r9    user area pointer         up
- r10   top-of-stack register     top
- r11   return stack pointer      rp
- r12   RISC OS frame pointer     fp    never touch this
- r13   stack pointer             sp
- r14   link register             lk
- r15   pc + status + flags       pc
- Note: In future CPU versions the internal structure of the PC
- register will be different, so it is better to think of the PC and
- the status register as two separate registers. The hardware-errors
- and the .REGISTERS instruction already know about this.
-
-
- Structured programming
- ======================
-
- This assembler supports structured programming, not with labels but
- with the usual Forth-like structures. The structures do not have to
- fit on one line, and they may be nested to any level. The range of
- the branches assembled by these structures is not restricted.
-
- Implemented structures are:
-
-     set the flags               \ produce the condition
-     condition if ...            \ if condition is met do this
-     else ...                    \ otherwise this
-     then
-
-
-     begin ...
-     set the flags               \ produce the condition
-     condition while ...         \ do this while condition is met
-     ( you may set the flags )
-     ( condition ) repeat        \ the repeat is normally always done
-                                 \ but you may also test for another
-                                 \ condition.
-
-
-     begin ...
-     set the flags               \ produce the condition
-     condition until             \ leave the loop when condition is met
-
-
-     begin ... again             \ loop forever, whatever may happen
-
-
- Porting
- =======
-
- The ARM assembler can also be used by other Forth systems. All
- hardware-specific parts are written portably and can easily be
- changed in case of problems. So a 68k-Forthmacs can metacompile ARM
- code with this assembler without any change. In fact, the very first
- metacompilation of this RISC OS Forthmacs took place on an ATARI-ST
- with 1MB RAM and a 720k disk.
-
-
- Byte-sex
- ========
-
- This assembler can produce code for either byte-sex (byte order),
- which allows portable assembler code for all ARM CPUs. LITTLE-ENDIAN
- and BIG-ENDIAN do the switch.
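The effect of the switch on the stored code can be modelled in a few lines of Python. This is only a sketch of the byte ordering, using the standard struct module; code_bytes is a hypothetical name, and the word used is the encoding of mov r0,r10.

```python
import struct


def code_bytes(word, big_endian=False):
    """Return the four bytes of a 32-bit instruction word as laid out in memory.

    Hypothetical helper modelling the choice between LITTLE-ENDIAN and
    BIG-ENDIAN target code.
    """
    fmt = ">I" if big_endian else "<I"
    return struct.pack(fmt, word & 0xFFFFFFFF)


MOV_R0_R10 = 0xE1A0000A  # mov r0,r10

if __name__ == "__main__":
    print(code_bytes(MOV_R0_R10).hex())                   # little-endian
    print(code_bytes(MOV_R0_R10, big_endian=True).hex())  # big-endian
```

The instruction word itself is identical in both cases; only the order in which its bytes land in the target dictionary differs.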
-
-
- ARM2/3/6
- ========
-
- The assembler takes care of some CPU-dependent restrictions: ARM2
- disallows the more advanced instructions, while ARM3 allows them.
-
-
- Forth Virtual Machine Considerations
- ====================================
-
- The Forth parameter stack is implemented with r13, but the name SP
- should be used instead of r13, in case the virtual machine
- implementation should change.
-
- The return stack is implemented with r11, and the name RP should be
- used to refer to it.
-
- The base address of the user area ( the user pointer) is r9 but should
- be referred to as UP. User variable number 124 (for instance) may be
- accessed with the
- up td 124 d)
- addressing mode. There is a macro 'USER which will assemble this
- addressing mode for you.
-
- The interpreter pointer IP is r8. The interpreter is
- post-incrementing, so when a code definition is being executed, IP
- points to the token after the one being executed. A "token" is the
- number that is compiled into the dictionary for each Forth word in a
- definition. For RISC OS Forthmacs, a token is a 32-bit absolute
- address.
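The post-incrementing behaviour can be pictured with a tiny Python model of the single instruction ldr pc,[r8],#4. It is a sketch only: memory is a plain dictionary, and all addresses except a148 (the code address of count shown earlier) are made up.

```python
def next_step(memory, ip):
    """Model of `ldr pc,[ip],#4`: fetch the token at ip, then advance ip.

    Returns the new (pc, ip) pair. Hypothetical helper for illustration.
    """
    pc = memory[ip]      # a token is a 32-bit absolute address
    return pc, ip + 4    # IP now points to the token after the one executed


# A hypothetical thread of three tokens compiled at address 9000 (hex).
memory = {0x9000: 0xA148, 0x9004: 0xA200, 0x9008: 0xA300}

if __name__ == "__main__":
    pc, ip = next_step(memory, 0x9000)
    print(hex(pc), hex(ip))
```

While the code at the fetched address runs, IP already holds the address of the following token, which is exactly the state a code definition sees.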
-
-
- Assembler Glossary
- ===================
-
-
- ________________________________________________________________________
- PC ( -- n )
- portable name for the PC register
-
-
- ________________________________________________________________________
- SP ( -- n )
- portable name for the stack pointer
-
-
- ________________________________________________________________________
- FSP ( -- n )
- portable name for the floating stack pointer
-
-
- ________________________________________________________________________
- UP ( -- n )
- portable name for the user pointer
-
-
- ________________________________________________________________________
- IP ( -- n )
- portable name for the instruction pointer
-
-
- ________________________________________________________________________
- RP ( -- n )
- portable name for the return stack pointer
-
-
- ________________________________________________________________________
- TOP ( -- n )
- portable name for the top of stack register
-
-
- ________________________________________________________________________
- 'USER ( -- ) 'name'
- Executed in the form:
- top 'user <name> ldr
- <name> is the name of a User variable. Assembles the appropriate
- addressing mode for accessing that User variable.
-
- In RISC OS Forthmacs, the addressing mode for User variables is
- up #n d)
- where #n is the offset of that variable within the User area.
-
-
- ________________________________________________________________________
- ;CODE ( -- ) C,I
- semi-colon-code
- ( -- )
- Used in the form:
- : <name> ... create ... ;code ... c; (or end-code)
- Stops compilation, terminates the defining word <name>, executes
- ASSEMBLER, and does DO-ENTERCODE.
-
- When <name> is later executed in the form:
- <name> <new-name>
- to define the word <new-name>, the later execution of <new-name> will
- cause the machine code sequence following the ;CODE to be executed.
-
- This is analogous to DOES>, except that the behavior of the defined
- word <new-name> is specified in assembly language instead of
- high-level Forth.
-
- ;CODE calls DO-ENTERCODE. This is implementation specific and
- assembles the code needed to start the assembler code with the body of
- the defined word in TOP:
- top sp push
- top lk th fc000003 # bic
-
- Note for specialists: You may do
- ;code
- -8 ass-allot
- ...
- and handle the link register and stack on your own which can be
- somewhat faster.
-
- See: CODE DOES>
-
-
- ________________________________________________________________________
- ADR ( rx addr -- )
- Assembler macro with the following effect:
-
- addr is moved to register rx. Within short distances this is achieved
- by a PCR instruction, otherwise it's more complicated.
-
- Note: The address will be relocated correctly!
-
-
- ________________________________________________________________________
- ALIGNING? ( -- addr )
- Variable holding a flag; true means the assembler does aligning on
- its own. Implemented for CPU-independent metacompiling.
-
-
- ________________________________________________________________________
- ALU-INSTRUCTIONS ( r-dest r-op1 op2{r-op2|imm} -- )
- Available instructions with this syntax:
-
- AND EOR SUB RSB ADD ADC SBC RSC TST TEQ CMP CMN ORR BIC
-
- These instructions all have two data-inputs to the alu, the register
- r-op1 and the operand op2. This can be another register or an 8-bit
- immediate.
-
- The register r-op2 can be shifted in any way given by a shift
- specifier, with the shift count being either a 5-bit integer or the
- contents of another register. The immediate operand can be rotated
- right by 2*(4-bit-integer).
-
- If you give "large" literals as arguments, the assembler will generate
- the correct shifts itself.
-
- The # modifier declares an immediate operand as in:
- top r0 3 # add
-
- The S modifier will set the flags according to the result, the
- instruction will be ADDS instead of ADD .
-
- MOV and MVN are somewhat different, the operand r-op1 isn't needed.
- Also, both can handle "big" immediates themselves,
- top th 12345678 # mov
- won't be a problem, MOV assembles all instructions needed.
-
- CMP and CMN can both handle negative immediate operands; they try
- to find out which operand is possible.
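The text does not say how MOV breaks up a "big" immediate. One plausible strategy, sketched here in Python under that assumption, is to split the constant into pieces that each fit the 8-bit rotated immediate format, loading the first with MOV and merging the rest with ORR. split_constant is a hypothetical name and the real assembler may choose differently.

```python
def split_constant(value):
    """Split a 32-bit constant into chunks that each fit one ARM immediate.

    Greedy sketch: start at the lowest set bit, round its position down to
    an even number (rotations are always even) and take eight bits.
    Not the documented Forthmacs algorithm.
    """
    value &= 0xFFFFFFFF
    chunks = []
    while value:
        lowest = (value & -value).bit_length() - 1  # index of lowest set bit
        shift = lowest & ~1                         # even bit position
        chunk = value & ((0xFF << shift) & 0xFFFFFFFF)
        chunks.append(chunk)
        value &= ~chunk                             # clear the bits just used
    return chunks


if __name__ == "__main__":
    for chunk in split_constant(0x12345678):
        print(hex(chunk))
```

For th 12345678 this gives four pieces, so four instructions would be needed with this strategy; a constant such as th f0000000 comes out as a single piece.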
-
-
- ________________________________________________________________________
- ASS-ALLOT ( n -- ) deferred
- Allocates n bytes in the dictionary. The address of the next
- available dictionary location is adjusted accordingly.
-
- Defaults to ALLOT; implemented for (CPU-independent) metacompiling.
-
-
- ________________________________________________________________________
- ASSEMBLER ( -- )
- Execution replaces the first vocabulary in the search order with the
- ASSEMBLER vocabulary, making all the assembler words accessible.
-
-
- ________________________________________________________________________
- BIG-ENDIAN ( -- )
- Switches assembler to big-endian target code
-
-
- ________________________________________________________________________
- BRANCH ( addr -- )
- Assembles a branch instruction to here. Can be modified by DOLINK and
- all condition codes.
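For reference, the word that a branch assembles can be modelled in Python as follows. The sketch uses the standard ARM branch format (24-bit word offset relative to the branch address plus 8, link flag in bit 24); encode_branch and its parameters are names invented for this illustration.

```python
def encode_branch(here, target, link=False, condition=0xE):
    """Return the 32-bit word for a branch located at here going to target.

    link=True models the DOLINK modifier (BL instead of B); condition is
    the 4-bit condition field, AL (0xE) by default. Hypothetical helper.
    """
    # The PC reads as the branch address + 8; the offset counts words.
    offset = (target - (here + 8)) >> 2
    word = (condition << 28) | (0b101 << 25) | (offset & 0x00FFFFFF)
    if link:
        word |= 1 << 24
    return word


if __name__ == "__main__":
    print(hex(encode_branch(0x8000, 0x8010)))             # forward branch
    print(hex(encode_branch(0x8010, 0x8000, link=True)))  # backward, with link
```

Backward branches simply wrap the negative word offset into the 24-bit field.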
-
-
- ________________________________________________________________________
- BYTE ( -- )
- modifier for the assembler, memory accesses mean byte wide access
-
-
- ________________________________________________________________________
- CODE ( -- ) 'name' M
- A defining word executed in the form:
- code <name> ... end-code or c;
- Creates a dictionary entry for <name> to be defined by a following
- sequence of assembly language words. Words thus defined are called
- code definitions or primitives. Executes ASSEMBLER and sets the
- opcode defaults.
-
- This is the most common way to begin assembly.
-
-
- See: END-CODE C;
-
-
- ________________________________________________________________________
- CODE! ( n addr -- ) Deferred
- Stores a 32-bit word into the code at addr.
-
- This word is deferred so that the metacompiler may change it to
- assemble code into the target dictionary rather than the resident
- dictionary. It also handles little/big endian target code.
-
-
- ________________________________________________________________________
- CODE, ( n -- ) Deferred
- Places n in the dictionary at ( assemblers ) HERE and ASS-ALLOTs
- enough space for a word.
-
- This word is deferred so that the metacompiler may change it to
- assemble code into the target dictionary rather than the resident
- dictionary. It also handles little/big endian target code.
-
-
- ________________________________________________________________________
- C; ( -- )
- c-semi-colon
- Terminates the current code definition and allows its name to be found
- in the dictionary.
-
- Sets the CONTEXT vocabulary to be same as the CURRENT vocabulary
- (which removes the ASSEMBLER vocabulary from the search order, unless
- you have explicitly done something funny to the search order while
- assembling the code).
-
- Executes NEXT to assemble the "next" routine at the end of the code
- word being defined. The "next" routine causes the Forth
- interpreter to continue execution with the next word.
-
-
- This is the most common way to end assembly; it calls END-CODE.
-
-
- ________________________________________________________________________
- CONDITIONS ( -- )
- All instructions are executed only if the correct condition is met.
- The assembler's default is AL (always), but these are also available:
-
- EQ NE CS CC MI PL VS VC HI LS GE LT GT LE AL
-
-
- ________________________________________________________________________
- DECR ( reg n# -- )
- Macro, n# will be subtracted from reg.
-
-
- ________________________________________________________________________
- DOLINK ( -- )
- modifier for BRANCH instruction, the current pc will be saved to the
- link register.
-
-
- ________________________________________________________________________
- END-CODE ( -- )
- Terminates a code definition and allows the <name> of the
- corresponding code definition to be found in the dictionary.
-
- The CONTEXT vocabulary is set to the same as the CURRENT vocabulary
- (which removes the ASSEMBLER vocabulary from the search order, unless
- you have explicitly done something funny to the search order while
- assembling the code).
-
- The NEXT routine is not automatically added to the end of the code
- definition. Usually you want NEXT to be at the end of the definition,
- but sometimes the last thing in the definition is a branch to
- somewhere else, so the NEXT at the end is not needed.
-
-
- See: C;
-
-
- ________________________________________________________________________
- ENTERCODE ( -- )
- Starts assembling after stack checking, setting the assembler defaults
- and switching to ASSEMBLER.
-
-
- ________________________________________________________________________
- GET-LINK ( reg -- )
- Assembler macro, equivalent to:
- lk th fc000003 # bic
- This is useful to get the address after a branch instruction.
-       xxxxx dolink branch ---+
-   A)  data ...               |
-                              |
-                              |
-   B)  top get-link      <----+
- So after branching to B), TOP will be set to A)
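Why the mask is fc000003 can be seen in a short Python model. On the 26-bit ARM CPUs the link register holds a copy of r15, where bits 31..26 are the flags and bits 1..0 the processor mode; clearing them leaves the plain address. The function name and the sample value are assumptions made for this sketch.

```python
PSR_MASK = 0xFC000003  # flag bits 31..26 and mode bits 1..0 of a 26-bit r15


def get_link(lk):
    """Model of the BIC with fc000003: strip status bits, keep the address."""
    return lk & ~PSR_MASK & 0xFFFFFFFF


if __name__ == "__main__":
    # Hypothetical link value: address a150, N flag set, mode bits = 3.
    print(hex(get_link(0x8000A153)))
```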
-
-
- ________________________________________________________________________
- INCR ( reg n# -- )
- Macro, n# will be added to reg.
-
-
- ________________________________________________________________________
- LABEL ( -- ) 'name' F83
- A defining word used in the form:
- label <name> ... end-code
- label <name> ... c;
- Creates a dictionary entry for <name> consisting of a following
- sequence of assembly language words. When <name> is later executed,
- the address of the first word of the assembly language sequence is
- left on the stack.
-
-
- See: END-CODE
-
-
- ________________________________________________________________________
- LDM ( rx1 rx2 .. rxn n# r-adr -- )
- Load multiple registers from the address pointed to by r-adr. An
- addressing mode must be specified.
-
- The register list is given by all register names (don't name a
- register twice) and the number of registers.
- r0 r1 r2 r3 4 sp ia! ldm
- This loads registers r0-r3 from the stack and sets the stack pointer
- to the next stack entry.
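The register list ends up as a 16-bit mask in the instruction, one bit per register. The following Python sketch builds the word for the example above from the standard ARM block-transfer format; register_mask and encode_block_transfer are hypothetical names, and the keyword arguments only mirror the letters of the addressing mode.

```python
def register_mask(registers):
    """Return the 16-bit register list; naming a register twice is an error."""
    mask = 0
    for reg in registers:
        if mask & (1 << reg):
            raise ValueError("register r%d named twice" % reg)
        mask |= 1 << reg
    return mask


def encode_block_transfer(base, registers, increment=True, before=False,
                          writeback=False, load=True):
    """Return the 32-bit word for LDM (load=True) or STM (load=False).

    increment/before correspond to the i/d and b/a letters of the
    addressing mode, writeback to the trailing "!". Condition is AL.
    """
    word = (0xE << 28) | (0b100 << 25)
    word |= (int(before) << 24) | (int(increment) << 23)
    word |= (int(writeback) << 21) | (int(load) << 20)
    word |= (base << 16) | register_mask(registers)
    return word


if __name__ == "__main__":
    # r0 r1 r2 r3 4 sp ia! ldm   (sp is r13)
    print(hex(encode_block_transfer(13, [0, 1, 2, 3], writeback=True)))
```

Because the list is a bit mask, the order in which the registers are named does not change the assembled word.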
-
-
- See: LDR STM
-
-
- ________________________________________________________________________
- LDR ( r-data r-adr operand2 -- )
- r-data is read from memory. The default is word (32-bit) wide, but the
- modifier BYTE selects byte-wide access.
-
- The address is calculated using r-adr and the operand2. It can be
- another register (the shift specified as usual by a 5-bit literal and
- a shift type) or a 12-bit immediate offset.
-
- operand2 can be added to or subtracted from r-adr according to the
- addressing mode defined by two letters. The first tells whether
- (i)ncreasing or (d)ecreasing should be used, the second whether the
- in/decreasing takes place (b)efore or (a)fter the memory access. A
- "!" at the end means that write-back will take place. So these modes
- are possible:
- da ia db ib \ decrease/increase after/before
- da! ia! db! ib! \ as above plus write-back
-
-
- Some macros make life a bit easier. They are somewhat 68k-like and
- must follow a BYTE modifier, because the assembler calculates the
- offset itself.
-
- : )     0 # ib ;
- : )+    @increment ia ;
- : )-    @increment da ;
- : -(    @increment db! ;
- : +(    @increment ib! ;
-
- : d)    dup abs # offset? swap 0< if db else ib then ;
- : d)!   dup abs # offset? swap 0< if db! else ib! then ;
- : push  -( str ;
- : pop   )+ ldr ;
- Examples:
- top r6 byte )+ ldr
- top up 8 d) ldr
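How the two-letter modes map onto the instruction word is sketched below in Python, using the standard ARM single data transfer format with an immediate offset. The function name and its arguments are invented for this illustration. One assumption to note: the sketch sets the write-back bit only for the "before" modes, since the "after" modes always update the base register in hardware.

```python
def encode_single_transfer(rd, rn, offset, mode, load=True, byte=False):
    """Return the 32-bit word for LDR/STR with a 12-bit immediate offset.

    mode is one of "ia", "da", "ib", "db", "ib!", "db!". Condition is AL.
    Hypothetical helper, not the Forthmacs implementation.
    """
    if not 0 <= offset < 4096:
        raise ValueError("offset does not fit in 12 bits")
    up = mode[0] == "i"                      # (i)ncrease or (d)ecrease
    pre = mode[1] == "b"                     # (b)efore or (a)fter the access
    writeback = pre and mode.endswith("!")   # W bit, pre-indexed only
    word = (0xE << 28) | (0b01 << 26)
    word |= (int(pre) << 24) | (int(up) << 23) | (int(byte) << 22)
    word |= (int(writeback) << 21) | (int(load) << 20)
    word |= (rn << 16) | (rd << 12) | offset
    return word


if __name__ == "__main__":
    # top r0 byte )+ ldr   ->  ldrb r10,[r0],#1
    print(hex(encode_single_transfer(10, 0, 1, "ia", byte=True)))
    # r0 sp push           ->  str r0,[r13,#-4]!
    print(hex(encode_single_transfer(0, 13, 4, "db!", load=False)))
    # pc ip )+ ldr         ->  ldr pc,[r8],#4
    print(hex(encode_single_transfer(15, 8, 4, "ia")))
```

The three calls correspond to the three memory instructions in the disassembly of count at the start of this chapter.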
-
-
- See: STR
-
-
- ________________________________________________________________________
- LITTLE-ENDIAN ( -- )
- Switches assembler to little-endian target code
-
-
- ________________________________________________________________________
- MLA ( r-dest r-op1 r-op2 )
- Assembles a multiply-and-accumulate instruction.
-
-
- ________________________________________________________________________
- MUL ( r-dest r-op1 r-op2 )
- Assembles a multiply instruction.
-
-
- ________________________________________________________________________
- NEXT ( -- )
- Assembler macro which assembles the NEXT routine, which is the Forth
- address interpreter.
-
- In RISC OS Forthmacs this is one single instruction.
- pc ip )+ ldr
-
-
- ________________________________________________________________________
- NOP ( -- )
- Assembler macro, equivalent to
- r0 r0 mov
-
-
- ________________________________________________________________________
- PCR ( addr -- pc offset )
- Assembler macro, expects an address on the stack and calculates its
- address offset from PC. The addressing mode is also set.
-
-
- ________________________________________________________________________
- RETURN ( -- )
- macro for
- pc lk mov
-
-
- ________________________________________________________________________
- S ( -- )
- Modifier; the instruction will set the flags according to the result.
- Default for tst, teq, tstp, teqp, cmp, cmn, cmpp and cmnp.
-
-
- ________________________________________________________________________
- STM ( rx1 rx2 .. rxn n# r-adr -- )
- Store multiple registers to the address pointed to by r-adr. An
- addressing mode must be specified.
-
-
- See: LDM for more details.
-
-
- ________________________________________________________________________
- STR ( r-data r-adr operand2 -- )
- r-data is stored to memory. The default is word (32-bit) wide, but the
- modifier BYTE selects byte-wide access.
-
-
- See: LDR
-
-
- ________________________________________________________________________
- SWI ( swi# -- )
- assembles a swi instruction, the number is swi#.
-
-
- ________________________________________________________________________
- SWIX ( swi# -- )
- assembles a swix instruction, the number is swi#.
-
-
- ________________________________________________________________________
- SWP ( r-dest r-base r-source -- )
- assembles a swp instruction, provided ARM3 code has been allowed by ARM3.
-
-
- ________________________________________________________________________
- T ( -- )
- modifier, force -T pin.
-
-
- ________________________________________________________________________
- ^ ( -- )
- modifier, force access to user-mode registers.
-
-